Incremental Haplotype Inference, Phylogeny, and Almost Bipartite Graphs
Author
Abstract
We address the combinatorial problem of inferring haplotypes in a population that forms a perfect phylogeny (PP), given a sample of genotypes. The problem is relevant because genotypes are easier to obtain than haplotypes by DNA sequencing. Because PPs appear naturally and frequently on DNA sequences of restricted length, PP haplotyping is a favourable approach to reliable haplotype inference. Since Gusfield's seminal paper from 2002, a number of different algorithms have been proposed. Here we give an algorithm that identifies haplotypes incrementally (along the sequence). Under the random mating assumption, all sufficiently frequent haplotypes are inferred from a random genotype sample of asymptotically optimal size. Owing to its extreme simplicity, the idea of the algorithm easily extends to more general population structures. This can be beneficial because the strict PP assumption is easily violated in reality. Missing data can also be recovered by incremental haplotyping, provided they are not too prevalent. In a more graph-theoretic part of this work, we solve a problem we call almost-2-coloring of graphs, which arises in an enhanced version of our haplotyping algorithm. We show that the solution space of this graph problem can be computed in
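To make the problem statement concrete, the following Python sketch checks a candidate answer to PP haplotyping. It does not implement the incremental algorithm described in the abstract, whose details are not given here. It assumes biallelic sites, haplotypes encoded as 0/1 vectors, and genotypes in which 2 marks a heterozygous site (a common convention in the PP haplotyping literature, assumed rather than taken from this paper). The perfect-phylogeny check is the standard four-gamete test for the unrooted case; the function names `conflate`, `is_perfect_phylogeny` and `is_pp_solution` are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# Assumed encoding (not taken from the paper): each site is biallelic,
# a haplotype is a 0/1 vector, and a genotype uses 0 or 1 for homozygous
# sites and 2 for heterozygous sites.
Haplotype = Sequence[int]
Genotype = Sequence[int]


def conflate(h1: Haplotype, h2: Haplotype) -> List[int]:
    """Return the genotype explained by the haplotype pair (h1, h2)."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must cover the same sites")
    # Equal alleles give a homozygous site; different alleles give 2.
    return [a if a == b else 2 for a, b in zip(h1, h2)]


def is_perfect_phylogeny(haplotypes: List[Haplotype]) -> bool:
    """Four-gamete test for an unrooted perfect phylogeny: no pair of sites
    may show all four allele combinations 00, 01, 10 and 11."""
    if not haplotypes:
        return True
    n_sites = len(haplotypes[0])
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            return False
    return True


def is_pp_solution(
    genotypes: List[Genotype],
    pairs: List[Tuple[Haplotype, Haplotype]],
) -> bool:
    """Check a candidate solution: pairs[k] must explain genotypes[k], and the
    set of all distinct haplotypes used must pass the four-gamete test."""
    if len(genotypes) != len(pairs):
        return False
    for g, (h1, h2) in zip(genotypes, pairs):
        if conflate(h1, h2) != list(g):
            return False
    used = [list(h) for h in {tuple(h) for pair in pairs for h in pair}]
    return is_perfect_phylogeny(used)


if __name__ == "__main__":
    # Haplotypes 000, 110, 011 explain both genotypes and form a PP.
    print(is_pp_solution([[2, 2, 0], [0, 2, 2]],
                         [([0, 0, 0], [1, 1, 0]), ([0, 0, 0], [0, 1, 1])]))  # True
    # Resolving genotype 22 in two different ways yields all four gametes.
    print(is_pp_solution([[2, 2], [2, 2]],
                         [([0, 1], [1, 0]), ([0, 0], [1, 1])]))  # False
```

A genotype with k ≥ 1 heterozygous sites has 2^(k-1) distinct resolutions into an unordered haplotype pair, so the PP requirement is what restricts which combinations are admissible; the second example shows two individually valid resolutions that together violate it.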
Similar resources
Haplotype Block Partitioning and tagSNP Selection under the Perfect Phylogeny Model
Single Nucleotide Polymorphisms (SNPs) are the most common form of polymorphism in the human genome. Analyses of genetic variations have revealed that individual genomes share common SNP haplotypes. The particular pattern of these common variations forms a block-like structure on the human genome. In this work, we develop a new method based on the Perfect Phylogeny Model to identify haplo...
Meta-Heuristic Algorithms for Minimizing the Number of Crossing of Complete Graphs and Complete Bipartite Graphs
The minimum crossing number problem is among the oldest and most fundamental problems arising in the area of automatic graph drawing. In this paper, eight population-based meta-heuristic algorithms are utilized to tackle the minimum crossing number problem for two special types of graphs, namely complete graphs and complete bipartite graphs. A 2-page book drawing representation is employed for ...
Balanced Degree-Magic Labelings of Complete Bipartite Graphs under Binary Operations
A graph is called supermagic if there is a labeling of edges where the edges are labeled with consecutive distinct positive integers such that the sum of the labels of all edges incident with any vertex is constant. A graph G is called degree-magic if there is a labeling of the edges by integers 1, 2, ..., |E(G)| such that the sum of the labels of the edges incident with any vertex v is equal t...
Linear Reduction for Haplotype Inference
Haplotype inference problem asks for a set of haplotypes explaining a given set of genotypes. Popular software tools for haplotype inference (e.g., PHASE, HAPLOTYPER), as well as new algorithms recently proposed for perfect phylogeny inference (DPPH), often do not scale well. When the number of sites (SNPs) reaches thousands, these tools often cannot deliver an answer in reasonable time even if...
Toward an Algebraic Understanding of Haplotype Inference by Pure Parsimony
Haplotype inference by pure parsimony (HIPP) is known to be NP-hard. Despite this, many algorithms successfully solve HIPP instances on simulated and real data. In this paper, we explore the connection between algebraic rank and the HIPP problem, to help identify easy and hard instances of the problem. The rank of the input matrix is known to be a lower bound on the size of an optimal HIPP solutio...
Publication date: 2004